Scalable Parallel Matrix Multiplication on Distributed Memory Parallel Computers
Abstract
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelized on a distributed memory parallel computer (DMPC) in O(log N) time by using N^α / log N processors. Such a parallel computation is cost optimal and matches the performance of PRAM. Furthermore, our parallelization on a DMPC can be made fully scalable, that is, for all 1 ≤ p ≤ N^α / log N, multiplying two N × N matrices can be performed by a DMPC with p processors in O(N^α / p) time, i.e., linear speedup and cost optimality can be achieved in the range [1..N^α / log N]. This unifies all known algorithms for matrix multiplication on DMPC, standard or non-standard, sequential or parallel. Extensions of our methods and results to other parallel systems are also presented. The above claims result in significant progress in scalable parallel matrix multiplication (as well as solving many other important problems) on distributed memory systems, both theoretically and practically.
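The cost-optimality claim in the abstract can be checked with simple arithmetic. The sketch below is not the paper's algorithm. It only evaluates the stated bounds, assuming the constant hidden in the O() notation is 1 and that the logarithm is base 2. The function names and the choice of Strassen's exponent α = log2 7 are illustrative assumptions.

```python
import math


def max_processors(n: int, alpha: float) -> int:
    """Upper end of the scalable range, N^alpha / log N (base-2 log assumed)."""
    if n < 2:
        raise ValueError("n must be at least 2 so that log N is positive")
    return max(1, int(n ** alpha / math.log2(n)))


def parallel_time(n: int, alpha: float, p: int) -> float:
    """Idealized running time N^alpha / p, with the O() constant taken as 1."""
    if not 1 <= p <= max_processors(n, alpha):
        raise ValueError("p is outside the range [1 .. N^alpha / log N]")
    return n ** alpha / p


def cost(n: int, alpha: float, p: int) -> float:
    """Processor-time product; cost optimality means this stays at N^alpha."""
    return p * parallel_time(n, alpha, p)


if __name__ == "__main__":
    n = 1024
    alpha = math.log2(7)  # exponent of Strassen's algorithm, about 2.807
    sequential = n ** alpha
    for p in (1, 64, 4096, max_processors(n, alpha)):
        t = parallel_time(n, alpha, p)
        # Speedup equals p (linear) and cost stays at N^alpha for every p.
        print(f"p={p:>12}  time={t:.3e}  speedup={sequential / t:.3e}  "
              f"cost={cost(n, alpha, p):.3e}")
```

Under these assumptions the processor-time product stays at N^α for every p in the range, and at the largest p the running time bottoms out at roughly log N, which is the O(log N) bound quoted in the abstract.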
Similar resources
Fast and Scalable Parallel Matrix
We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the Nth power, the inverse, the characteristic polynomial, the determinant, the rank, and an LU- and a QR-factorization of a matrix, and solving linear systems of equations. These c...
Fast and Scalable Parallel Matrix Computations with Optical Buses
We present fast and highly scalable parallel computations for a number of important and fundamental matrix problems on linear arrays with reconfigurable pipelined optical bus systems. These problems include computing the Nth power, the inverse, the characteristic polynomial, the determinant, the rank, and an LU- and a QR-factorization of a matrix, and solving linear systems of equations. These com...
A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure
The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most of the popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that can run on all structures, especially the Fibonacci Hypercube structure, is therefore necessary for parallel matr...
PUMMA: Parallel universal matrix multiplication algorithms on distributed memory concurrent computers
This paper describes the Parallel Universal Matrix Multiplication Algorithms (PUMMA) on distributed memory concurrent computers. The PUMMA package includes not only the non-transposed matrix multiplication routine C = A · B, but also transposed multiplication routines C = A^T · B, C = A · B^T, and C = A^T · B^T, for a block scattered data distribution. The routines perform efficiently for a wide ...
Fast matrix multiplication techniques based on the Adleman-Lipton model
Abstract. On distributed memory electronic computers, the implementation and association of fast parallel matrix multiplication algorithms has yielded astounding results and insights. In this discourse, we use the tools of molecular biology to demonstrate the theoretical encoding of Strassen’s fast matrix multiplication algorithm with DNA based on an n-moduli set in the residue number system, t...